Abstract

This paper develops information geometric representations for nonlinear filters in continuous time. The posterior distribution associated with an abstract nonlinear filtering problem is shown to satisfy a stochastic differential equation on a Hilbert information manifold. This supports the Fisher metric as a pseudo-Riemannian metric. Flows of Shannon information are shown to be connected with the quadratic variation of the process of posterior distributions in this metric. Apart from providing a suitable setting in which to study such information-theoretic properties, the Hilbert manifold has an appropriate topology from the point of view of multi-objective filter approximations. A general class of finite-dimensional exponential filters is shown to fit within this framework, and an intrinsic evolution equation, involving Amari's $-1$-covariant derivative, is developed for such filters. Three example systems, one of infinite dimension, are developed in detail.

1 Introduction

Let $(X_t \in \mathbb{X},\ t \ge 0)$ be a Markov "signal" process taking values in a metric space $\mathbb{X}$, and let $(Y_t \in \mathbb{R}^d,\ t \ge 0)$ be an "observation" process defined by

\[ Y_t = \int_0^t h_s(X_s)\,ds + B_t, \tag{1} \]

where $h : [0,\infty) \times \mathbb{X} \to \mathbb{R}^d$ is a Borel measurable function, and $(B_t \in \mathbb{R}^d,\ t \ge 0)$ is a $d$-vector Brownian motion, independent of $X$. In this context, the problem of nonlinear filtering is that of estimating $X_t$, at each time $t$, from the observations available up to that time, $(Y_s,\ s \in [0,t])$. In order to compute various optimal estimates of $X_t$, such as the maximum a-posteriori probability estimate (if $\mathbb{X}$ is discrete) or the minimum mean-square-error estimate (if $\mathbb{X}$ is a normed linear space), it is usually necessary to find, or at least to approximate, the entire observation-conditional distribution of $X_t$. That a regular version of such a distribution exists, and can be represented by an abstract version of Bayes' formula, is one of the important early developments in the subject [m. However, starting with the work of Wonham [26] and Shiryayev [2l], much of the theory of nonlinear filtering concerns recursive filtering equations, in which representations of the posterior distribution are shown to satisfy particular stochastic differential equations. The reader is referred to [6] for a wide range of articles on the theory and current practice of the subject.

Recursive filtering equations are typically expressed in ways that are specific to the nature of the signal space $\mathbb{X}$. If, for example, $\mathbb{X}$ is discrete, then the filter can be expressed as a stochastic ordinary differential equation for the vector of posterior probabilities of the individual states, $x \in \mathbb{X}$ [261121]; whereas, if $X$ is a multidimensional diffusion process, then the filter can be expressed as a stochastic partial differential equation for the posterior density [12]. One of the aims of this paper is to unify such results through the use of a filter "state space" that is based on estimation theoretic constructs rather than the underlying topology of $\mathbb{X}$. The state space used is a Hilbert manifold of probability measures on $\mathbb{X}$. This has an appropriate topology for the study of both approximation errors and information theoretic properties. These notions are discussed next in the context of an abstract Bayesian problem, in which the estimand $U : \Omega \to \mathbb{U}$ and observation $V : \Omega \to \mathbb{V}$ are defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and take values in measure spaces $(\mathbb{U}, \mathcal{U}, \lambda_U)$ and $(\mathbb{V}, \mathcal{V}, \lambda_V)$, respectively. We assume that $P_{UV} \ll P_U \otimes \lambda_V$, where $P_{UV}$ is the joint distribution of $(U, V)$, and $P_U$ is the marginal of $U$. Let $\mathcal{P}(\mathcal{U})$ be the set of probability measures on $\mathcal{U}$, let $R_V : \mathbb{U} \times \mathbb{V} \to [0,\infty)$ be a measurable function, for which $dP_{UV} = R_V\,d(P_U \otimes \lambda_V)$, and let $\Omega' := \{\omega \in \Omega : 0 < \int_{\mathbb{U}} R_V(u, V(\omega))\,P_U(du) < \infty\}$. Then $\Omega' \in \mathcal{F}$, $\mathbb{P}(\Omega') = 1$, and $P_{U|V} : \Omega \to \mathcal{P}(\mathcal{U})$, defined by

\[ P_{U|V}(A) = \mathbf{1}_{\Omega'}\,\frac{\int_A R_V(u,V)\,P_U(du)}{\int_{\mathbb{U}} R_V(u,V)\,P_U(du)}, \tag{2} \]

is a regular $V$-conditional distribution for $U$. (See [15] for details.)

In many applications of Bayesian estimation, including nonlinear filtering, it is not possible to express $P_{U|V}$ in terms of a finite number of statistics, and so it is useful to construct approximations: $\hat P : \Omega \to Q \subset \mathcal{P}(\mathcal{U})$, where $\hat P(A)$ is $V$-measurable for all $A$, and $Q$ is of finite dimension. Single estimation objectives, such as minimum mean-square error in the estimate of a real-valued quantity $f(U)$, induce their own specific measures of approximation error on $\mathcal{P}(\mathcal{U})$. On the other hand, if $f$ is sufficiently regular, a more generic measure of error such as the $L^2$ metric on densities may be useful. If $\lambda_U$ is a probability measure, and $P_{U|V}(\omega)$ and $\hat P(\omega)$ have densities $p_{U|V}(\omega)$ and $\hat p(\omega)$ with respect to $\lambda_U$, then the difference between the minimum mean-square error estimate of $f(U)$ and the mean of $f$ under $\hat P(\omega)$ can be bounded by means of the Cauchy-Schwarz inequality:

\[ \big(E_{P_{U|V}(\omega)} f - E_{\hat P(\omega)} f\big)^2 \le E_{\lambda_U} f^2\; E_{\lambda_U}\big(p_{U|V}(\omega) - \hat p(\omega)\big)^2. \tag{3} \]


Although, in this context, the $L^2$ metric on densities bounds the estimation error, it may still be poor in practice. This is so, for example, if $f$ is the indicator function of a rare, but important, event. Moreover, we often need generic measures of error that are suitable for a variety of objectives. This is especially important if the underlying estimation problem is inherently multi-objective.

Multi-objective measures of approximation error are discussed in [21]. One such measure is the Kullback-Leibler (KL) divergence (or "relative entropy"):

\[ \mathcal{D}(Q\,|\,P) := \begin{cases} \int \log\dfrac{dQ}{dP}\,dQ & \text{if } Q \ll P, \\ +\infty & \text{otherwise.} \end{cases} \tag{4} \]

This is widely used in variational Bayesian estimation. (See, for example, [25].) Apart from its use as a measure of approximation error, the KL-divergence plays a central role in Shannon information theory. The mutual information between $U$ and $V$ is defined as follows |S]:

\[ I(U; V) := \mathcal{D}(P_{UV}\,|\,P_U \otimes P_V) = \mathbb{E}\,\mathcal{D}(P_{U|V}\,|\,P_U). \tag{5} \]

The term $\mathcal{D}(P_{U|V}\,|\,P_U)$, here, can be interpreted as the information gain of the posterior distribution $P_{U|V}$ over the prior $P_U$.
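
As an illustration of (5), the following minimal sketch checks numerically, on a small discrete example, that the divergence of the joint distribution from the product of its marginals equals the average information gain of the posterior over the prior. The joint table `joint` and the helper `kl` are hypothetical choices made for this example only; they are not part of the paper.

```python
import math

def kl(q, p):
    """KL divergence D(q | p) for discrete distributions; +inf unless q << p."""
    total = 0.0
    for qi, pi in zip(q, p):
        if qi > 0:
            if pi == 0:
                return math.inf
            total += qi * math.log(qi / pi)
    return total

# Hypothetical joint distribution of (U, V): rows index u, columns index v.
joint = [[0.10, 0.25, 0.15],
         [0.30, 0.05, 0.15]]
n_u, n_v = 2, 3
p_u = [sum(joint[u]) for u in range(n_u)]                       # marginal of U
p_v = [sum(joint[u][v] for u in range(n_u)) for v in range(n_v)]  # marginal of V

# First expression in (5): divergence of the joint from the product of marginals.
mi_joint = kl([joint[u][v] for u in range(n_u) for v in range(n_v)],
              [p_u[u] * p_v[v] for u in range(n_u) for v in range(n_v)])

# Second expression in (5): average information gain of posterior over prior.
mi_gain = sum(p_v[v] * kl([joint[u][v] / p_v[v] for u in range(n_u)], p_u)
              for v in range(n_v))
```

The two quantities agree up to rounding error, which is the content of (5) in the discrete case.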

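Suppose that $W : \Omega \to \mathbb{W}$ is a second observation taking values in a measure space $(\mathbb{W}, \mathcal{W}, \lambda_W)$ such that $V$ and $W$ are $U$-conditionally independent and $P_{UW} \ll P_U \otimes \lambda_W$. Let $R_W : \mathbb{U} \times \mathbb{W} \to [0,\infty)$ be a measurable function for which $dP_{UW} = R_W\,d(P_U \otimes \lambda_W)$, and let $\Omega'' := \{\omega \in \Omega : 0 < \int_{\mathbb{U}} R_W(u, W(\omega))\,P_{U|V}(\omega)(du) < \infty\}$. Then $\Omega'' \in \mathcal{F}$, $\mathbb{P}(\Omega'') = 1$, and $P_{U|VW} : \Omega \to \mathcal{P}(\mathcal{U})$, defined by

\[ P_{U|VW}(A) = \mathbf{1}_{\Omega''}\,\frac{\int_A R_W(u,W)\,P_{U|V}(du)}{\int_{\mathbb{U}} R_W(u,W)\,P_{U|V}(du)}, \tag{6} \]

is a regular $(V, W)$-conditional distribution for $U$. The mutual information $I(U; (V, W))$ can be decomposed in the following way,

\[ I(U; (V, W)) = I(U; V) + \mathbb{E}\,I(U; W|V), \tag{7} \]

where $I(U; W|V)$ is the $V$-conditional mutual information between $U$ and $W$,

\[ I(U; W|V) := \mathcal{D}(P_{UW|V}\,|\,P_{U|V} \otimes P_{W|V}) = \mathbb{E}\big(\mathcal{D}(P_{U|VW}\,|\,P_{U|V})\,\big|\,V\big). \tag{8} \]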

(The conditional mutual information is sometimes defined as the average value of this quantity [5].) Equation (6) can be used recursively in estimation problems having sequences of conditionally independent observations. The information extracted from each observation in the sequence is then associated with a "local" Bayesian problem, in which earlier observations enter only through the "local prior" (the posterior derived from the earlier observations). The decomposition (7), (8) is valid whether or not $V$ and $W$ are $U$-conditionally independent. However, in the absence of such conditional independence it is not possible to interpret $\mathcal{D}(P_{U|VW}\,|\,P_{U|V})$ as a local information gain in this way.

Let $(X_t, Y_t,\ t \ge 0)$ be as described at the start of this section, and consider the problem of estimating the path of $X$ from $Y$. For any $0 \le t \le s < \infty$, let

\[ Y_t^s := (Y_r - Y_t,\ r \in [t,s]). \tag{9} \]

Then $Y_0^t$ and $Y_t^s$ are $X$-conditionally independent, and we can use the above methodology to identify the $Y_0^t$-conditional mutual information between $X$ and $Y_t^s$, $I(X; Y_t^s\,|\,Y_0^t)$. This is shown in section 3.1 to be related to the quadratic variation of the posterior distribution in the Fisher metric. The latter is defined in terms of the mixed second derivative of the KL-divergence, and so an appropriate state space for the nonlinear filter in this context is a subset of $\mathcal{P}(\mathcal{X})$ having a differentiable structure with respect to which the KL-divergence admits such a derivative. This is also desirable for the assessment of approximation errors. Information Geometry is the study of sets of probability measures having such structures.

Information geometry is applied to nonlinear filtering in [2] and the references therein. The posterior distributions for diffusion signal processes are assumed, there, to have densities with respect to Lebesgue measure, whose square-roots satisfy stochastic differential equations in $L^2(\mathbb{R}^m, \mathcal{B}^m, \mathrm{Leb})$. The induced distance function between probability measures is the Hellinger distance. The coefficients of the filtering equation are projected in this sense onto the tangent spaces of finite-dimensional exponential models, in order to obtain approximations to filters. Information theoretic justification is given (when restricted to tangent vectors corresponding to differentiable curves of square-root probability densities, the $L^2$ norm corresponds to the Fisher metric), and comparisons are made with other methods such as moment matching. Although suitable for this purpose, the Hellinger space cannot be used as an infinite-dimensional statistical manifold since the KL-divergence is discontinuous at every point of it. (See the discussion at the end of section 2 in [20].) Furthermore, in common with the $L^2$ space of densities, it has a boundary, which can create problems with numerical methods.

The local information gain of a nonlinear filter is connected with the notion of entropy production in nonequilibrium statistical mechanics [TBl ITSj ITU]. The information geometric properties of nonlinear filters are also, therefore, of interest in this context.

The remainder of the paper is structured as follows. Section 2 outlines the main ingredients of information geometry and reviews the Hilbert manifold $M$, which is used extensively in the sequel. Section 3 outlines a general nonlinear filtering problem, and expresses the associated process of conditional distributions as an Itô process on $M$. This allows the study of the quadratic variation of the filter in the Fisher metric. An $M$-valued evolution equation is derived for a class of finite-dimensional exponential filters in section 4. The results of sections 3 and 4 are formulated in terms of a set of hypotheses, some of which are not especially ripe. Section 5 develops three examples in which they are satisfied. Finally, section 6 makes some concluding remarks.


2 Information Geometry 


We review the main ingredients of information geometry, by outlining the classical finite-dimensional exponential model. This is also used in section 4. Let $(\mathbb{X}, \mathcal{X}, \mu)$ be a probability space on which are defined random variables $(\xi_i,\ i = 1,\ldots,n)$ with the following properties: (i) the random variables $(1, \xi_1, \xi_2, \ldots, \xi_n)$ represent linearly independent elements of $L^0(\mu)$, i.e. $\mu(\alpha + \sum_i y^i\xi_i = 0) = 1$ if and only if $\alpha = 0$ and $\mathbb{R}^n \ni y = 0$; (ii) $E_\mu\exp(\sum_i y^i\xi_i) < \infty$ for all $y$ in a non-empty open subset $G \subset \mathbb{R}^n$. For each $y \in G$, let $P_y$ be the probability measure on $\mathcal{X}$ with density

\[ \frac{dP_y}{d\mu} = \exp\Big(\sum_i y^i\xi_i - c(y)\Big), \]

where $c(y) = \log E_\mu\exp(\sum_i y^i\xi_i)$, and let $N := \{P_y : y \in G\}$. It follows from (i) that the map $G \ni y \mapsto P_y \in N$ is a bijection. Let $\theta : N \to G$ be its inverse; then $(N, \theta)$ is an exponential statistical manifold with an atlas comprising the single chart $\theta$. We can think of a tangent vector at $P \in N$ as being an equivalence class of differentiable curves passing through $P$: two curves (expressed in coordinates), $(y(t) \in G : t \in (-\epsilon,\epsilon))$ and $(z(t) \in G : t \in (-\epsilon,\epsilon))$, being equivalent at $P$ if $y(0) = z(0) = \theta(P)$ and $\dot y(0) = \dot z(0)$. The tangent space at $P$, $T_PN$, is the linear space of all such tangent vectors, and is spanned by the vectors $(\partial_i,\ i = 1,\ldots,n)$, where $\partial_i$ is the equivalence class containing the curve $(y_i(t) := \theta(P) + te_i,\ t \in (-\epsilon,\epsilon))$, and $e_i^j$ is equal to the Kronecker delta. The tangent bundle is the disjoint union $TN := \cup_{P \in N}(P, T_PN)$, and admits the global chart $\Theta : TN \to G \times \mathbb{R}^n$, where $\Theta^{-1}(y,u) := (\theta^{-1}(y), u^i\partial_i)$. If a function $f : N \to \mathbb{R}$ is differentiable, and $U \in T_PN$, then we write

\[ Uf = u^i\partial_i f := u^i\,\frac{d}{dt}(f \circ \theta^{-1})(y_i(t))\Big|_{t=0}, \]

where $(y,u) = \Theta(P,U) = (\theta(P), U\theta)$, and we have used the Einstein summation convention, that indices appearing once as a superscript and once as a subscript are summed out.
According to the Eguchi relations [9], the mixed second derivative of the KL-divergence defines the Fisher metric as a Riemannian metric on $N$: for any $P \in N$ and any $U, V \in T_PN$,

\[ \langle U, V\rangle_P := -UV\mathcal{D} = g(P)_{ij}\,u^i v^j, \]

where $U$ and $V$ act on the first and second argument of $\mathcal{D}$, respectively, and

\[ g(P)_{ij} := \langle\partial_i, \partial_j\rangle_P = E_P(\xi_i - E_P\xi_i)(\xi_j - E_P\xi_j) \tag{10} \]

is the matrix form of the Fisher metric [1]. The mixed third derivatives of the KL-divergence define a pair of covariant derivatives on $N$. These give rise to notions of curvature of statistical manifolds, which are important in the theory of asymptotic statistics [1].
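
The covariance form (10) can be checked numerically on a finite sample space. The sketch below is illustrative only: the five-point space, the reference measure `mu` and the statistics `xi` are hypothetical choices, and the comparison relies on the standard fact that, for an exponential family, the Fisher matrix in the coordinates $y$ coincides with the Hessian of the log-normaliser $c$, which is approximated here by central finite differences.

```python
import math

# Hypothetical finite sample space: five points with reference measure mu.
mu = [0.1, 0.2, 0.3, 0.25, 0.15]
points = [-1.0, -0.5, 0.0, 0.5, 1.0]
# Two statistics xi_1(x) = x and xi_2(x) = x^2; (1, xi_1, xi_2) are linearly independent.
xi = [[x for x in points], [x * x for x in points]]
n = len(xi)

def c(y):
    """Log-normaliser c(y) = log E_mu exp(sum_i y^i xi_i)."""
    return math.log(sum(w * math.exp(sum(y[i] * xi[i][k] for i in range(n)))
                        for k, w in enumerate(mu)))

def probabilities(y):
    """Point masses of P_y, i.e. mu times the exponential density."""
    cy = c(y)
    return [w * math.exp(sum(y[i] * xi[i][k] for i in range(n)) - cy)
            for k, w in enumerate(mu)]

def fisher(y):
    """Matrix (10): covariance of the statistics xi_i under P_y."""
    prob = probabilities(y)
    mean = [sum(p * xi[i][k] for k, p in enumerate(prob)) for i in range(n)]
    return [[sum(p * (xi[i][k] - mean[i]) * (xi[j][k] - mean[j])
                 for k, p in enumerate(prob))
             for j in range(n)] for i in range(n)]

def hessian_c(y, eps=1e-4):
    """Central finite-difference approximation of the Hessian of c at y."""
    def shifted(i, si, j, sj):
        z = list(y)
        z[i] += si * eps
        z[j] += sj * eps
        return c(z)
    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * eps ** 2)
             for j in range(n)] for i in range(n)]

y0 = [0.3, -0.1]
g = fisher(y0)
hess = hessian_c(y0)
```

The matrices `g` and `hess` agree to within the finite-difference error, and `g` is symmetric and positive definite, as required of a Riemannian metric.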

The literature on information geometry is dominated by the study of finite-dimensional manifolds of probability measures such as $(N, \theta)$. The reader is referred to BE] and the references therein for further information. In order to extend these ideas to infinite dimensions we need to choose a system of charts with respect to which the KL-divergence admits a suitable number of derivatives. It is clear from (4) that the smoothness properties of this divergence are closely connected with those of the density $dQ/dP$ and its log (considered as elements of dual spaces of functions). In the series of papers m Uni ESI E3], G. Pistone and his co-workers developed an infinite-dimensional exponential statistical manifold on an abstract probability space $(\mathbb{X}, \mathcal{X}, \mu)$. Probability measures in the manifold are mutually absolutely continuous with respect to the reference measure $\mu$, and the manifold is covered by the charts $s_P(Q) = \log dQ/dP - E_P\log dQ/dP$ for different "patch-centric" probability measures $P$. These readily give $\log dQ/dP$ the desired regularity, but require ranges that are subsets of exponential Orlicz spaces in order to do the same for $dQ/dP$. The exponential Orlicz manifold is a natural extension of the finite-dimensional manifold $(N, \theta)$ described above; it has a strong topology, under which the KL-divergence is of class $C^\infty$.

However, this approach is technically demanding and leads to manifolds that are larger than needed in many applications. Furthermore, the exponential Orlicz space is less suited to the theory of stochastic differential equations than Hilbert space; the latter is the natural setting for the $L^2$ theory of stochastic integration [7]. An infinite-dimensional Hilbert manifold of "finite-entropy" probability measures, on which the KL-divergence is twice differentiable, is developed in [20]. This uses a chart involving both the density, $dP/d\mu$, and its log. The finite entropy condition is natural in estimation problems where the mutual information between the estimand and the observation is finite. Banach manifolds of finite-entropy measures, on which the KL-divergence admits higher derivatives, are developed in [21]. That reference also develops Hilbert and Banach manifolds of finite measures suitable for the "un-normalised" equations of nonlinear filtering. We shall make extensive use of the Hilbert manifold of [20] in this paper; it is reviewed next.

2.1 The Hilbert Manifold M 

For a probability space $(\mathbb{X}, \mathcal{X}, \mu)$, $M$ is the set of probability measures on $\mathcal{X}$ satisfying the following conditions:

(M1) $P$ is mutually absolutely continuous with respect to $\mu$;

(M2) $E_\mu p^2 < \infty$;

(M3) $E_\mu\log^2 p < \infty$.

(We denote probability measures in $M$ by the upper-case letters $P$, $Q$, etc., and their densities with respect to $\mu$ by the corresponding lower case letters, $p$, $q$, etc.) Let $\mathcal{L}^0(\mathbb{X}, \mathcal{X})$ be the set of real-valued random variables on $\mathbb{X}$, and $\mathcal{L}^2(\mathbb{X}, \mathcal{X}, \mu)$ the subset of square-integrable random variables. Let $H$ be the Hilbert space of equivalence classes of centred elements of $\mathcal{L}^2(\mathbb{X}, \mathcal{X}, \mu)$ (those having zero mean), and let $\Lambda : \mathcal{L}^0(\mathbb{X}, \mathcal{X}) \to H$ be defined by

\[ \Lambda f \ni f - E_\mu f \ \ \text{if } f \in \mathcal{L}^2(\mathbb{X}, \mathcal{X}, \mu), \qquad \Lambda f = 0 \ \ \text{otherwise.} \tag{11} \]

Let $m, e : M \to H$ be defined as follows:

\[ m(P) = \Lambda p \quad\text{and}\quad e(P) = \Lambda\log p. \tag{12} \]

Variants of these are used in finite-dimensional information geometry as coordinate maps for mixture and exponential models, respectively [1]. However, in the present context their images $m(M)$ and $e(M)$ are typically not open subsets of $H$ [20]; so, even though they are injective, $m$ and $e$ cannot be used as charts for $M$. On the other hand the map $\phi : M \to H$ defined by their sum,

\[ \phi(P) = m(P) + e(P) = \Lambda(p + \log p), \tag{13} \]

is a bijection [20]. $(M, \phi)$ is a Hilbert manifold with an atlas comprising a single chart.

Although not themselves charts, the maps $m$ and $e$ provide useful representations of elements of H since they are bi-orthogonal. The simplest manifestation of this property is the identity

\[ \mathcal{D}(P\,|\,Q) + \mathcal{D}(Q\,|\,P) = \langle m(P) - m(Q),\, e(P) - e(Q)\rangle_H. \tag{14} \]

It thus follows from the non-negativity of the KL-divergence that

\[ \|m(P) - m(Q)\|_H^2 + \|e(P) - e(Q)\|_H^2 \le \|\phi(P) - \phi(Q)\|_H^2; \tag{15} \]

in particular, $m \circ \phi^{-1}$ and $e \circ \phi^{-1}$ are Lipschitz continuous. Furthermore

\[ \mathcal{D}(P\,|\,Q) + \mathcal{D}(Q\,|\,P) \le \tfrac{1}{2}\|\phi(P) - \phi(Q)\|_H^2. \tag{16} \]
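
The identity (14) and the bounds (15) and (16) can be illustrated on a finite sample space, where every probability measure equivalent to $\mu$ belongs to $M$. The following sketch is a hedged numerical check only; the four-point reference measure `mu` and the two densities are hypothetical, and the helper names are not taken from the paper.

```python
import math

mu = [0.2, 0.3, 0.1, 0.4]   # hypothetical reference probability measure

def expect(f):
    """E_mu f for a function f given by its values on the sample space."""
    return sum(w * v for w, v in zip(mu, f))

def centre(f):
    """Lambda f = f - E_mu f (every function is square-integrable here)."""
    mean = expect(f)
    return [v - mean for v in f]

def inner(a, b):
    """Inner product of H, the centred elements of L^2(mu)."""
    return expect([x * y for x, y in zip(a, b)])

def m(p):
    return centre(p)

def e(p):
    return centre([math.log(v) for v in p])

def phi(p):
    return [x + y for x, y in zip(m(p), e(p))]

def kl(p, q):
    """D(P | Q) for densities p and q with respect to mu."""
    return expect([a * math.log(a / b) for a, b in zip(p, q)])

def density(weights):
    """Normalise positive weights so that E_mu p = 1."""
    total = expect(weights)
    return [w / total for w in weights]

p = density([1.0, 2.0, 0.5, 1.5])
q = density([0.7, 0.9, 3.0, 1.1])

dm = [a - b for a, b in zip(m(p), m(q))]
de = [a - b for a, b in zip(e(p), e(q))]
dphi = [a - b for a, b in zip(phi(p), phi(q))]

symmetric_kl = kl(p, q) + kl(q, p)   # left-hand side of (14)
pairing = inner(dm, de)              # right-hand side of (14)
```

Here `symmetric_kl` and `pairing` coincide up to rounding, `inner(dm, dm) + inner(de, de)` does not exceed `inner(dphi, dphi)`, and `symmetric_kl` does not exceed half of `inner(dphi, dphi)`.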

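The inverse map $\phi^{-1} : H \to M$ is given by

\[ \frac{d\phi^{-1}(a)}{d\mu} = \psi(\tilde a + Z(a)), \]

where $\psi : \mathbb{R} \to (0,\infty)$ is the inverse of the function $(0,\infty) \ni z \mapsto z + \log z \in \mathbb{R}$, $\tilde a$ is any function in the equivalence class $a$, and $Z : H \to \mathbb{R}$ is the unique function for which $E_\mu\psi(\tilde a + Z(a)) = 1$ for all $a \in H$. $Z$ is (Fréchet) differentiable with derivative

\[ DZ_a u = -\frac{E_\mu\psi'(\tilde a + Z(a))\,\tilde u}{E_\mu\psi'(\tilde a + Z(a))}, \tag{17} \]

where $\tilde u$ is any function in the equivalence class $u$ [20].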
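
The chart and its inverse can be sketched on a finite sample space as follows. This is an illustration under stated assumptions rather than a method taken from the paper: the reference measure `mu` is hypothetical, and both $\psi$ and the normalising shift $Z(a)$ are computed by Newton's method, which is one convenient choice among several. The update for $Z$ uses the relation $\psi' = \psi/(1+\psi)$, which follows from differentiating $z + \log z$.

```python
import math

mu = [0.2, 0.3, 0.1, 0.4]   # hypothetical reference probability measure

def expect(f):
    """E_mu f for a function f given by its values on the sample space."""
    return sum(w * v for w, v in zip(mu, f))

def psi(y, iterations=60):
    """Inverse of z -> z + log z on (0, inf), by Newton's method in w = log z."""
    w = math.log(y) if y > 1.0 else y - 1.0
    for _ in range(iterations):
        w -= (math.exp(w) + w - y) / (math.exp(w) + 1.0)
    return math.exp(w)

def chart(p):
    """phi(P) = Lambda(p + log p), as in (13)."""
    f = [v + math.log(v) for v in p]
    mean = expect(f)
    return [v - mean for v in f]

def normaliser(a, iterations=60):
    """Z(a): the shift for which E_mu psi(a + Z) = 1, by Newton's method."""
    z = 0.0
    for _ in range(iterations):
        values = [psi(v + z) for v in a]
        slopes = [v / (1.0 + v) for v in values]   # psi' = psi / (1 + psi)
        z -= (expect(values) - 1.0) / expect(slopes)
    return z

def inverse_chart(a):
    """Density of phi^{-1}(a) with respect to mu."""
    z = normaliser(a)
    return [psi(v + z) for v in a]

p = [0.5, 1.5, 2.0, 0.625]           # a density: E_mu p = 1
p_back = inverse_chart(chart(p))     # round trip through the chart
```

The round trip recovers `p` to within rounding error, which is consistent with $\phi$ being a bijection in this finite setting.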

A tangent vector $U$ at $P \in M$ is an equivalence class of differentiable curves at $P$. We denote the tangent space at $P$ by $T_PM$, and the tangent bundle by $TM$. The latter admits the global chart $\Phi : TM \to H \times H$, where $\Phi(P, U) = (a(0), \dot a(0))$, and $(a(t),\ t \in (-\epsilon, \epsilon))$ is any differentiable curve in the equivalence class $U$ (expressed in terms of the chart $\phi$). If $f : M \to \mathbb{Y}$ is a map with range $\mathbb{Y}$ (a Banach space) and the map $f \circ \phi^{-1} : H \to \mathbb{Y}$ is (Fréchet) differentiable, then we write

\[ Uf := \frac{d}{dt}(f \circ \phi^{-1})(a(t))\Big|_{t=0} = D(f \circ \phi^{-1})_a u, \]

where $(a, u) = \Phi(P, U) = (\phi(P), U\phi)$. A weaker notion of d-differentiability is defined in [20]. The map $f : M \to \mathbb{Y}$ is d-differentiable if, for any $P \in M$, there exists a continuous linear map $d(f \circ \phi^{-1})_a : H \to \mathbb{Y}$ such that

\[ \frac{d}{dt}(f \circ \phi^{-1})(a(t))\Big|_{t=0} = d(f \circ \phi^{-1})_a u, \]

for all differentiable curves $a$ in the equivalence class $U$. We then write

\[ Uf = d(f \circ \phi^{-1})_a u. \]

The KL-divergence, $\mathcal{D} : M \times M \to [0,\infty)$, is Fréchet differentiable in each argument, and both derivatives are d-differentiable in the remaining argument [20]. We can use this fact, together with the Eguchi relations [9], to define the Fisher metric on $T_PM$: for any $P \in M$ and $U, V \in T_PM$,

\[ \langle U, V\rangle_P := -UV\mathcal{D} = E_\mu\frac{p}{(1+p)^2}(\tilde u + DZ_a u)(\tilde v + DZ_a v), \tag{18} \]

where $U$ and $V$ act on the first and second arguments of $\mathcal{D}$, respectively, $a = \phi(P)$, $u = U\phi$, $v = V\phi$, $\tilde u$ is any function in the equivalence class $u$, and $\tilde v$ is any function in the equivalence class $v$. $(T_PM, \langle\,\cdot\,,\cdot\,\rangle_P)$ is an inner product space, whose norm admits the bound: $\|U\|_P \le \|u\|_H$. However, since the Fisher norm is not (in general) equivalent to the model space norm, $(T_PM, \langle\,\cdot\,,\cdot\,\rangle_P)$ is not a Hilbert space. $(M, \langle\,\cdot\,,\cdot\,\rangle_P)$ is a pseudo-Riemannian manifold rather than a Riemannian manifold.

$H$-valued stochastic processes play a major role in what follows. In order to ensure that they have suitable measurability properties, we introduce the following additional hypothesis. This is satisfied, for example, if $\mathbb{X}$ is a complete separable metric (Polish) space, and $\mathcal{X}$ is its Borel $\sigma$-algebra.

(M4) $H$ is separable.

Lemma 2.1. If $(\mathbb{Z}, \mathcal{Z})$ is a measurable space, and $f : \mathbb{Z} \times \mathbb{X} \to \mathbb{R}$ is jointly measurable, then the map $\mathbb{Z} \ni z \mapsto \Lambda f(z, \cdot) \in H$ is $\mathcal{Z}$-measurable.

Proof. Let $B := \{z \in \mathbb{Z} : f(z, \cdot) \in \mathcal{L}^2(\mathbb{X}, \mathcal{X}, \mu)\}$; then, according to Tonelli's theorem, $B \in \mathcal{Z}$. Fubini's theorem shows that, for any $g \in \mathcal{L}^2(\mathbb{X}, \mathcal{X}, \mu)$, the function $\mathbb{Z} \ni z \mapsto \mathbf{1}_B(z)E_\mu f(z, \cdot)g \in \mathbb{R}$ is $\mathcal{Z}$-measurable, i.e. the map $\mathbb{Z} \ni z \mapsto \Lambda f(z, \cdot) \in H$ is weakly $\mathcal{Z}$-measurable. The statement of the lemma follows from (M4) and Pettis's theorem. □








Under (M4), $H$ is of countable dimension, and so admits a complete orthonormal basis $(\eta_i \in H,\ i \in \mathbb{N})$. An element $(P, U) \in TM$ thus has the coordinate representation $(a^i, u^j;\ i, j \in \mathbb{N})$, where $a^i = \langle\phi(P), \eta_i\rangle_H$ and $u^j = \langle U\phi, \eta_j\rangle_H$. In particular, any $U \in T_PM$ admits the representation $U = u^jD_j$, where $(P, D_j) = \Phi^{-1}(\phi(P), \eta_j)$. For any $P \in M$ and $i, j \in \mathbb{N}$, let

\[ G(P)_{i,j} := \langle D_i, D_j\rangle_P. \tag{19} \]

Then it follows from the domination of $\|U\|_P$ by $\|u\|_H$ that, for any $U, V \in T_PM$,

\[ \langle U, V\rangle_P = G(P)_{i,j}\,u^i v^j, \tag{20} \]

in the sense that both series are absolutely convergent, and the result does not depend on the order in which the limits are taken.


3 The M-Valued Nonlinear Filter 


We consider a general nonlinear filtering problem as outlined in section 1, in which all random quantities are defined on a complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The signal space $\mathbb{X}$ is a complete separable metric space, $\mathcal{X}$ is its Borel $\sigma$-algebra, and $\mu$ is a reference probability measure on $\mathcal{X}$. $(M, \phi)$ is the associated Hilbert manifold, as described in section 2.1. We shall assume that $X$ has right-continuous sample paths with left limits at all $t \in (0,\infty)$, and that the distribution of $X_t$, $P_t$, has a density with respect to $\mu$ satisfying the Kolmogorov forward equation

\[ \frac{\partial p_t}{\partial t} = \mathcal{A}_t p_t \quad\text{for } t \in [0,\infty), \tag{21} \]

where $(\mathcal{A}_t,\ t \ge 0)$ is a family of linear operators on an appropriate class of functions $f : \mathbb{X} \to \mathbb{R}$.


Example 3.1. $\mathbb{X} = \{1, 2, \ldots, m\}$, $X$ is a time-homogeneous Markov jump process with rate matrix $A$, $\mu$ is mutually absolutely continuous with respect to the counting measure (with Radon-Nikodym derivative $r$), and $h_t\,(= h)$ does not depend on $t$. In this case

\[ (\mathcal{A}_t p)(x) = (\mathcal{A}p)(x) = \frac{1}{r(x)}\,(A^{*}rp)(x) \quad\text{for all } t. \]
Example 3.2. $\mathbb{X} = \mathbb{R}^m$, $X$ is a time-homogeneous multidimensional diffusion process with suitably regular drift vector $b$ and diffusion matrix $a$, $\mu$ is mutually absolutely continuous with respect to Lebesgue measure (with Radon-Nikodym derivative $r$), and $h_t\,(= h)$ does not depend on $t$. In this case

\[ \mathcal{A}_t p = \mathcal{A}p = \frac{1}{2r}\frac{\partial^2(a^{ij}rp)}{\partial x^i\partial x^j} - \frac{1}{r}\frac{\partial(b^i rp)}{\partial x^i} \quad\text{for all } t. \]

The set-up is sufficiently general to include path estimators.

Example 3.3. $\mathbb{X} = C([0,\infty); \mathbb{R}^m)$, $X_t = (\tilde X_s,\ s \ge 0)$ for all $t$, where $\tilde X$ is the diffusion process of Example 3.2, and $h_t(x) = \tilde h(x_t)$ for some $\tilde h : \mathbb{R}^m \to \mathbb{R}^d$. In this case $\mathcal{A}_t = 0$ for all $t$.

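Let $(\mathcal{Y}_t \subset \mathcal{F},\ t \ge 0)$ be the filtration generated by the observation process $Y$, augmented by the $\mathbb{P}$-null sets of $\mathcal{F}$, and let $\mathcal{P}_Y$ be the $\sigma$-algebra of predictable subsets of $[0,\infty) \times \Omega$. We assume that:

(F1) $P_0 \in M$.

(F2) For any $T < \infty$, $\mathbb{E}\int_0^T |h_t(X_t)|^2\,dt < \infty$.

(F3) There exists, on the product space $\Omega \times \mathbb{X}$, a $\mathcal{P}_Y \times \mathcal{X}$-measurable, $(0,\infty)$-valued process $(\pi_t,\ t \ge 0)$, for which

\[ \mathbb{P}\Big(\int_{\mathbb{X}}\pi_t(\,\cdot\,, x)\,\mu(dx) = 1 \text{ for all } t \ge 0\Big) = 1. \]

For any $t$ and any $A \in \mathcal{X}$, $\mathbb{P}(X_t \in A\,|\,\mathcal{Y}_t) = \Pi_t(A)$, where

\[ \Pi_t(A) := \int_A \pi_t(\,\cdot\,, x)\,\mu(dx). \tag{22} \]

(F4) $\mathbb{P}(\pi_t \in \mathrm{Dom}\,\mathcal{A}_t \text{ for all } t \ge 0) = 1$, the process $(\mathcal{A}_t\pi_t,\ t \ge 0)$ is $\mathcal{P}_Y \times \mathcal{X}$-measurable and, for any $T < \infty$,

\[ \mathbb{P}\Big(\int_0^T E_\mu(1 + \pi_t^{-1})^2(\mathcal{A}_t\pi_t)^2\,dt < \infty\Big) = 1. \tag{23} \]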
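(F5) For any $T < \infty$,

\[ \mathbb{P}\Big(\int_0^T E_\mu|h_t - \bar h_t|^4\,dt < \infty\Big) = 1, \tag{24} \]

\[ \mathbb{P}\Big(\int_0^T E_\mu(\pi_t + 1)^2|h_t - \bar h_t|^2\,dt < \infty\Big) = 1, \tag{25} \]

where

\[ \bar h_t := \begin{cases} E_\mu\pi_t h_t & \text{if } E_\mu\pi_t|h_t| < \infty, \\ 0 & \text{otherwise.} \end{cases} \tag{26} \]

(F6) For almost all $x$, $(\pi_t,\ t \ge 0)$ satisfies the following Itô equation on $\Omega$:

\[ \pi_t = p_0 + \int_0^t \mathcal{A}_s\pi_s\,ds + \int_0^t \pi_s(h_s - \bar h_s)^{*}\,d\nu_s, \tag{27} \]

where $(\nu_t,\ t \ge 0)$ is the innovations process,

\[ \nu_t := Y_t - \int_0^t \bar h_s\,ds. \tag{28} \]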

Remark 3.1. (i) Because of (F2), $(\nu_t,\ t \ge 0)$ is a $d$-dimensional $(\mathcal{Y}_t)$-Brownian motion [13].

(ii) In the context of Example 3.1, (27) becomes a system of stochastic ordinary differential equations derived independently by Wonham [26] and Shiryayev. In the context of Example 3.2, it becomes a stochastic partial differential equation known as the Kushner-Stratonovich equation ITBj. See m for a variety of conditions under which nonlinear filters admit the representation (27).
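
For the finite-state setting of Example 3.1, the filtering equation can be simulated with a simple time discretisation. The sketch below is not taken from the paper: it uses an Euler-Maruyama scheme (an assumption made here for illustration), works with the posterior probabilities $q_t(x) = r(x)\pi_t(x)$ rather than the density $\pi_t$, takes a scalar observation ($d = 1$), and adopts the convention that `rates[i][j]` is the jump rate from state `i` to state `j`. The function name and all parameter values are hypothetical.

```python
import math
import random

def simulate_wonham(rates, h, q0, horizon=1.0, steps=2000, seed=0):
    """Euler-Maruyama sketch of the finite-state filter for posterior probabilities.

    rates[i][j] is the jump rate from state i to state j (rows sum to zero),
    h[i] is the scalar observation function and q0 the initial distribution.
    """
    rng = random.Random(seed)
    m = len(q0)
    dt = horizon / steps

    # Draw the initial signal state from q0.
    state, acc, u = m - 1, 0.0, rng.random()
    for j in range(m):
        acc += q0[j]
        if u < acc:
            state = j
            break

    q = list(q0)
    path = [list(q)]
    for _ in range(steps):
        # Signal: jump from `state` to j with probability rates[state][j] * dt.
        u, acc = rng.random(), 0.0
        for j in range(m):
            if j != state:
                acc += rates[state][j] * dt
                if u < acc:
                    state = j
                    break
        # Observation increment dY = h(X) dt + dB.
        dy = h[state] * dt + rng.gauss(0.0, math.sqrt(dt))
        # Innovation increment: d(nu) = dY - h_bar dt, with h_bar the posterior mean of h.
        h_bar = sum(qi * hi for qi, hi in zip(q, h))
        dnu = dy - h_bar * dt
        # Euler step: forward-equation drift plus innovation-driven correction.
        drift = [sum(q[i] * rates[i][j] for i in range(m)) for j in range(m)]
        q = [q[j] + drift[j] * dt + q[j] * (h[j] - h_bar) * dnu for j in range(m)]
        path.append(list(q))
    return path

path = simulate_wonham(rates=[[-1.0, 1.0], [2.0, -2.0]], h=[-1.0, 1.0], q0=[0.5, 0.5])
```

Both the drift and the innovation term sum to zero over the states, so the scheme preserves total probability up to rounding; positivity is not enforced by the scheme and relies on the step size being small.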

The intention here is to develop an $M$-valued representation for the process $\Pi$ of (22). With this in mind, we introduce the following $H$-valued processes

\[ u_t := \Lambda(1 + \pi_t^{-1})\mathcal{A}_t\pi_t \quad\text{and}\quad \zeta_t := \tfrac{1}{2}\Lambda|h_t - \bar h_t|^2, \tag{29} \]

where $\Lambda$ is as defined in (11), and the following $L(\mathbb{R}^d, H)$-valued process

\[ v_t := \Lambda(\pi_t + 1)(h_t - \bar h_t)^{*}. \tag{30} \]
Proposition 3.1. Suppose that $(X, Y)$ satisfies (F1-F6), and $\Pi$, $u$, $\zeta$ and $v$ are as defined in (22), (29) and (30). Then

(i) $\mathbb{P}(\Pi_t \in M \text{ for all } t \ge 0) = 1$;

(ii) $(\phi(\Pi_t),\ t \ge 0)$ satisfies the following (infinite-dimensional) Itô equation

\[ \phi(\Pi_t) = \phi(P_0) + \int_0^t (u_s - \zeta_s)\,ds + \int_0^t v_s\,d\nu_s. \tag{31} \]

Remark 3.2. (i) The first integral in (31) is a Bochner integral, and the second is an Itô integral. The stochastic calculus of Hilbert-space-valued semimartingales is developed pedagogically in [7]. In the general case, the stochastic integral is defined for a Hilbert-space-valued Wiener process and the stochastic integrand is a Hilbert-Schmidt-operator-valued process. In the present context $\nu$ is of finite dimension, and so, if $\mathbb{R}^d$ is equipped with the Euclidean inner product, any element of $L(\mathbb{R}^d, H)$ is a Hilbert-Schmidt operator. For any $\rho, \sigma \in L(\mathbb{R}^d, H)$, the associated inner product is

\[ \langle\rho, \sigma\rangle_{HS} := \sum_{k=1}^d \langle\rho e_k, \sigma e_k\rangle_H, \tag{32} \]

where $(e_k,\ 1 \le k \le d)$ is any orthonormal basis in $\mathbb{R}^d$.

(ii) Although natural and inclusive, (F3-F6) are not particularly ripe. We develop some examples in which they are satisfied in section 5.

(iii) The case $h = 0$ provides an $M$-valued representation for the marginal distribution $(P_t,\ t \ge 0)$.

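Proof. According to (F6) there exists an $F \in \mathcal{X}$ with $\mu(F) = 1$ such that $\pi(\,\cdot\,, x)$ satisfies (27) on $\Omega$ for all $x \in F$. Itô's rule shows that, for any such $x$, $(\pi_t + \log\pi_t,\ t \ge 0)$ satisfies the following Itô equation on $\Omega$:

\[ \pi_t + \log\pi_t = p_0 + \log p_0 + I_t + J_t, \tag{33} \]

where

\[ I_t := \int_0^t (\tilde u_s - \tilde\zeta_s)\,ds, \quad \tilde u_t := \mathbf{1}_F(1 + \pi_t^{-1})\mathcal{A}_t\pi_t, \quad \tilde\zeta_t := \tfrac{1}{2}\mathbf{1}_F|h_t - \bar h_t|^2, \]
\[ J_t := \int_0^t \tilde v_s\,d\nu_s, \quad \tilde v_t := \mathbf{1}_F(\pi_t + 1)(h_t - \bar h_t)^{*}. \tag{34} \]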
The Fubini theorem and Cauchy-Schwarz inequality show that, for any $T < \infty$,

\[ E_\mu\sup_{t \in [0,T]} I_t^2 \le T\int_0^T E_\mu(\tilde u_t - \tilde\zeta_t)^2\,dt, \]

and so it follows from (23) and (24) that

\[ \mathbb{P}\big(I_t \in \mathcal{L}^2(\mathbb{X}, \mathcal{X}, \mu) \text{ for all } t \ge 0\big) = 1. \tag{35} \]

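For any $m \in \mathbb{N}$ and any $T < \infty$, let

\[ \tau_m := \inf\{t \ge 0 : K_t \ge m\} \wedge T \quad\text{where}\quad K_t := \int_0^t E_\mu|\tilde v_s|^2\,ds. \tag{36} \]

According to (25), $\mathbb{P}(K_T < \infty) = 1$, $\tau_m$ is a $(\mathcal{Y}_t)$-stopping time, and $(J_{t \wedge \tau_m},\ t \ge 0)$ is a continuous martingale on $\Omega$, for almost all $x$. According to the stochastic Fubini theorem (Theorem 4.18 in [7]), $J_{t \wedge \tau_m}$ is $\mathbb{P} \otimes \mu$-measurable for each $t$, and so $\sup_{t \in [0,T]} J_{t \wedge \tau_m}^2$ is also $\mathbb{P} \otimes \mu$-measurable. Applying Doob's inequality and then integrating with respect to $\mu$, we obtain

\[ \mathbb{E}\,E_\mu\sup_{t \in [0,T]} J_{t \wedge \tau_m}^2 \le 4E_\mu\mathbb{E}J_{\tau_m}^2 = 4\mathbb{E}K_{\tau_m} \le 4m. \]

Now $\tau_m = T$ for all $m > K_T$, and so

\[ \mathbb{P}\Big(E_\mu\sup_{t \in [0,T]} J_t^2 < \infty\Big) = 1; \tag{37} \]

in particular,

\[ \mathbb{P}\big(J_t \in \mathcal{L}^2(\mathbb{X}, \mathcal{X}, \mu) \text{ for all } t \ge 0\big) = 1. \tag{38} \]

Now $\inf_{y \in (0,\infty)} y\log y = -1/e$, and so $\pi_t^2 + \log^2\pi_t \le (\pi_t + \log\pi_t)^2 + 2/e$. Part (i) thus follows from (F1), (33), (35) and (38), as does the fact that

\[ \mathbb{P}\big(\phi(\Pi_t) = \phi(P_0) + \Lambda I_t + \Lambda J_t \text{ for all } t\big) = 1. \]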

According to (F3), (F4) and Lemma 2.1, $\phi(\Pi)$, $\bar h$, $u$, $\zeta$ and $v$ are all $\mathcal{P}_Y$-measurable. Furthermore, it follows from (23)-(25) that

\[ \mathbb{P}\Big(\int_0^t \|u_s - \zeta_s\|_H\,ds + \int_0^t \|v_s\|_{HS}^2\,ds < \infty \text{ for all } t \ge 0\Big) = 1, \]

and so the integrals on the right-hand side of (31) are well defined. (See section 4.2 in [7].) It remains to show that the operator $\Lambda$ commutes with the operations of ordinary and stochastic integration in (34).

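Let $(a_t,\ t \ge 0)$ be the (continuous) $H$-valued process on the right-hand side of (31), let $(\eta_i,\ i \in \mathbb{N})$ be a complete orthonormal basis for $H$ and, for each $i$, let $\tilde\eta_i \in \mathcal{L}^2(\mathbb{X}, \mathcal{X}, \mu)$ be a function in the equivalence class $\eta_i$. It follows from the infinite-dimensional Itô rule (Theorem 4.17 in [7]) that

\[ \langle\eta_i, a_t\rangle_H = \langle\eta_i, \phi(P_0)\rangle_H + \int_0^t \langle\eta_i, u_s - \zeta_s\rangle_H\,ds + \int_0^t \langle\eta_i, v_s\,d\nu_s\rangle_H \]
\[ \qquad = E_\mu\tilde\eta_i(p_0 + \log p_0) + \int_0^t E_\mu\tilde\eta_i(\tilde u_s - \tilde\zeta_s)\,ds + \int_0^t E_\mu\tilde\eta_i\tilde v_s\,d\nu_s, \tag{39} \]

where, in the second step, we have used the fact (derived from (23)-(25)) that $\tilde u_t, \tilde\zeta_t, \tilde v_t y \in \mathcal{L}^2(\mathbb{X}, \mathcal{X}, \mu)$ for all $y \in \mathbb{R}^d$, for almost all $(t, \omega)$. The Cauchy-Schwarz inequality shows that, for any $m \in \mathbb{N}$ and any $T < \infty$,

\[ E_\mu\Big(\mathbb{E}\int_0^{\tau_m} |\tilde\eta_i\tilde v_s|^2\,ds\Big)^{1/2} = E_\mu|\tilde\eta_i|\Big(\mathbb{E}\int_0^{\tau_m} |\tilde v_s|^2\,ds\Big)^{1/2} \le \sqrt{\mathbb{E}K_{\tau_m}} \le \sqrt{m}, \]

where $\tau_m$ and $K_t$ are as defined in (36), and so, according to the stochastic Fubini theorem (Theorem 4.18 in [7]),

\[ \int_0^{t \wedge \tau_m} E_\mu\tilde\eta_i\tilde v_s\,d\nu_s = E_\mu\tilde\eta_i J_{t \wedge \tau_m}. \]

Since $m$ and $T$ are arbitrary, this is also true if $t \wedge \tau_m$ is replaced by any $t \ge 0$. According to (37) and the dominated convergence theorem $\Lambda J$ is continuous; so

\[ \mathbb{P}\Big(\int_0^t E_\mu\tilde\eta_i\tilde v_s\,d\nu_s = E_\mu\tilde\eta_i J_t \text{ for all } t \ge 0\Big) = 1. \]

Applying this and the Fubini theorem to (39), we obtain

\[ \langle\eta_i, a_t\rangle_H = E_\mu\tilde\eta_i(p_0 + \log p_0) + E_\mu\tilde\eta_i I_t + E_\mu\tilde\eta_i J_t = E_\mu\tilde\eta_i(\pi_t + \log\pi_t) = \langle\eta_i, \phi(\Pi_t)\rangle_H, \]

where we have used (33) in the second step, and part (i) in the third step. This completes the proof of part (ii). □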
3.1 Quadratic Variation

Suppose that (F1-F6) hold, and let $(\phi(\Pi_t)^i,\ i \in \mathbb{N})$ be the coordinate representation for $\Pi_t$ in terms of a complete orthonormal basis, $(\eta_i,\ i \in \mathbb{N})$, for $H$. Then $\phi(\Pi_t)^i$ is equal to the right-hand side of (39). The components $\{(\phi(\Pi_t)^i, \mathcal{Y}_t),\ i = 1, 2, \ldots\}$ form a system of real-valued semimartingales with quadratic co-variations


(40) 


We define the $M$-intrinsic quadratic variation of the process $\Pi$ (its quadratic variation in the Fisher metric) as follows:



(41) 


where $G$ is as defined in (19), and we have used (20) and (18) in the third step, and (17) in the final step. The final integrand here is the $\mathcal{Y}_s$-conditional mean-square error for the filter's estimate of $h_s(X_s)$. The average value of the final integral is known to be related to the mutual information $I(X; Y_0^t)$ [8]. We refine this result here, characterising the $\mathcal{Y}_t$-conditional variant of this mutual information, as defined in (8).

Proposition 3.2. For any $0 \le t \le s < \infty$,

\[ I(X; Y_t^s\,|\,Y_0^t) = \tfrac{1}{2}\,\mathbb{E}\big([\Pi]_s - [\Pi]_t\,\big|\,\mathcal{Y}_t\big), \tag{42} \]

where $Y_t^s$ is as defined in (9).

Proof. Let $D([0,\infty); \mathbb{X})$ be the Skorohod space of right-continuous, left-limit maps $\theta : [0,\infty) \to \mathbb{X}$, and let $H : [0,s] \times D([0,\infty); \mathbb{X}) \to \mathbb{R}^d$ be defined by




0 otherwise. 
Fubini's theorem and (F2) show that $\mathbb{P}(H_r(X) = h_r(X_r) \text{ for all } r \in [0,s]) = 1$. Let $\rho : [0,s] \times \Omega \times D([0,\infty); \mathbb{X}) \to [0,\infty)$ be defined by

\[ \rho_t(\,\cdot\,, \theta) = \exp\Big(\int_0^t (H_r(\theta) - \bar h_r)^{*}\,d\nu_r - \frac{1}{2}\int_0^t |H_r(\theta) - \bar h_r|^2\,dr\Big), \]

let $\mathcal{S}$ be the Borel $\sigma$-algebra on $D([0,\infty); \mathbb{X})$, and let $P_X \in \mathcal{P}(\mathcal{S})$ be the distribution of $X$. Theorem 7.23 of [13] shows that there exists a regular $Y_0^t$-conditional distribution $P_{X|Y_0^t} : \Omega \to \mathcal{P}(\mathcal{S})$, whose density with respect to $P_X$ is $\rho_t$. (The filtration $(\mathcal{F}_t)$ of Theorem 7.23 is here $\mathcal{F}_t = \sigma(X) \vee \mathcal{Y}_t$.)

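So, from (8),

\[ I(X; Y_t^s\,|\,Y_0^t) = \frac{1}{2}\,\mathbb{E}\Big(\int_t^s |H_r(X) - \bar h_r|^2\,dr\,\Big|\,\mathcal{Y}_t\Big) = \frac{1}{2}\,\mathbb{E}\Big(\int_t^s\int_{D([0,\infty);\mathbb{X})} |h_r(\theta_r) - \bar h_r|^2\,P_{X|Y_0^r}(d\theta)\,dr\,\Big|\,\mathcal{Y}_t\Big), \]

which completes the proof. □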

The solutions of stochastic differential equations driven by continuous semimartingales are typically non-differentiable. This is an important property of nonlinear filters in continuous time. Over a short time interval $[t, s]$ the observation process $Y$ introduces the small quantity of new information (42) to the filter. If the filter process $\Pi$ were differentiable, then the filter would "know" $\Pi_s$ to first-order accuracy at time $t$, thereby contradicting the novelty of the information. Proposition 3.2 takes this intuition further by connecting the infinitesimal information gain with the quadratic variation of the filter.


4 Finite-Dimensional Exponential Filters 

Let {N, 6) be the Lnite-dimensional exponential manifold outlined in section 
121 with the stronger conditions that < oo for all i, and exp(2Eil/*6) < 
oo for all y E G] let := A^i for i = 1,... ,n. Theorem 5.1 in [20] shows 
that $N$ is a $C^\infty$-embedded submanifold of $M$. We consider a nonlinear filtering
problem, as developed in section 3, fulfilling (F1-F6) and the following
additional hypotheses.

(F7) $\mathbb{P}(\Pi_t \in N$ for all $t \ge 0) = 1$.

(F8) For any t and any P & N, there exists at least one measurable function 
p : X —)• (0, oo) lying in the domain of At for which dP/dfi = p. For any 
p with this property, p~^AtP G C‘^{X,X,p) and Ap~^AtP G span{^j}. 
For any p,p with this property p{AtP = Atp) = 1. 

(F9) For any t and any k, Ah^ G span{^j} and A\ht\^ G span{^j}. 

Let $U, V_k : [0,\infty) \times N \to TN$ be vector fields on $N$ with $e$-representations

Ue,t{P) := Ap-^AtP = ul^{P)^i (43) 

^e,k,tiP) ■= ^e,k,t = Ah\ = V*(44) 

where $p$ is as in (F8), and $u_t(P)$ and $v_{k,t}$ are the $\theta$-representations. Since
$V_{e,k,t}$ does not depend on $P$, $V_{k,t} \in C^\infty(N; TN)$ for all $t$. We assume, further,
that

(FIO) Ui G C^{N;TN) for all t. 

Consider the following intrinsic Stratonovich equation on N: 



where $\nabla^{(-1)}$ is Amari’s $-1$-covariant derivative [1], and $(\beta_t, t \ge 0)$ is a
$d$-vector Brownian motion.

Proposition 4.1. Suppose that $(X,Y)$ satisfies (F1-F6), and (F7-F10) with
respect to $N$, and that $U$ and $V_k$ are as defined in (43), (44). Then (45) has a
strong solution $\Psi : N \times C([0,\infty); \mathbb{R}^d) \to C([0,\infty); N)$, and $\Pi = \Psi(P_0, \nu)$.

Proof. As in the proof of Proposition 13.11 we can apply Ito’s rule to flTT)) to 
show that, for all x G P, (logvTi, t > 0) satishes the following ltd equation 
on fl: 

logTTt = logpo + / IpiTTf^AsTTs - (s) ds, + / Ipihs - KydUs, 

Jo Jo 
where ( is as defined in fIM]) . This can be “lifted” to an $H$-valued equation
in the same way that (15^ was lifted to flTT]) in the proof of Proposition 3.1.
The resulting equation is:


Alla — Cs) ds+ Ah* di/g 

Jo 

ds) - ))ds+ / \e,k,sduay (46) 

Jo 

where $W_{e,t}(P)$ ($:= \Lambda |h_t - E_P h_t|^2/2$) is the $e$-representation of a time-dependent
vector field $W : [0,\infty) \times N \to TN$. For any $t$ such that $|h_t| \in L^2(X, \mathcal{X}, \mu)$,
$E_P h_t^k = \langle m(P), \Lambda h_t^k \rangle_H + E_\mu h_t^k$, and so, according to Theorem 5.1 in [20],
$W_t \in C^\infty(N; TN)$ for all $t$.

The Christoffel symbols for can be found from the Eguchi relations 

Ei: 

o3 

= - Ep6)(ei - Epe,)(L - Epe^), 

where is the (/,m) element in the inverse of the Fisher matrix in 

^-coordinates, g, as defined in (fTOj) . So 


-1 

S 


e(n,) = e(Po) + / (Avr, 

Jo 

= e(Po) -F [ (Ue,^(l 
Jo 




^?^™(P)Epv7,^,te - Ep6)v7,/7, - EpQiU - ^pL)di 
g^^{P)Ep{h>l - EphIfiU - ^pL)di 

g^^{P)Ep ((h/ - Eph’lY - Ep(h/ - Eph^f) iU - ^pUdi, 


and 

= 29'’"(P) (W,(P),S„.)pS,. 

k 

which shows that ^ G C°°{N;TN). So fH5|) has a strong 

solution, $\Psi$. The fact that $\Pi = \Psi(P_0, \nu)$ follows from (46), which is the
$e$-representation of (45). □
5 Examples

5.1 An infinite-dimensional diffusion filter 

This example is developed from that in section 8.6.2 of [13]. The signal is
a special case of the diffusion process of Example 3.2 in section 3, in which
$m = d = 1$, $a = 1$, and $b$ and $h$ satisfy

$$|b(x)| + |b'(x)| + |b''(x)| + |b'''(x)| + |h(x)| + |h'(x)| + |h''(x)| \le C,$$

$$|b'''(x) - b'''(y)| + |h''(x) - h''(y)| \le C|x - y|, \qquad (47)$$

for all $x, y \in \mathbb{R}$ and some $C < \infty$, $P_0 = N(0, R)$ (the Gaussian measure with
mean zero and variance $R > 0$), and $\mu$ has density $2^{-1}\exp(-|x|)$ with respect
to Lebesgue measure.

Proposition 5.1. The diffusion process $(X,Y)$ defined above satisfies (F1-F6).

Proof. (F1) is easily verified and, since $h$ is bounded, (F2) is immediate.
According to Lemma 8.5 in [13], $X_t$ admits the following $\mathcal{Y}_t$-conditional density
with respect to Lebesgue measure:

= j Eexp(P(x) - B{y) + Tt{x,y))n{yfi){x)n{Q,R){y)dy, 

where $B(x) := \int_0^x b(y)\,dy$, $n(m, v)$ is the mean $m$, variance $v$ Gaussian density,

Tfix^y) := [\h - h){Xy/n di^s f {{h - kf + k + b') {Xy/k ds, 
Jo ^ Jo 

and $(\tilde{X}_s^{t,x,y}, s \in [0,t])$ is a Brownian motion on an auxiliary probability space
$(\tilde\Omega, \tilde{\mathcal{F}}, \tilde{\mathbb{P}})$, pinned to the values $y$ at $s = 0$ and $x$ at $s = t$. $\tilde{X}^{t,x,y}$ can be
expressed in terms of a Brownian bridge process $(\tilde{W}_s^t, s \in [0,t])$ as
$\tilde{X}_s^{t,x,y} = sx/t + (t - s)y/t + \tilde{W}_s^t$. (See Corollary 8.6 in [13]. NB. Equations (8.97)
and (8.108) in [13] contain some typographical errors, which are corrected
in the above.) The $\mathcal{Y}_t$-conditional distribution of $X_t$ thus admits the strictly
positive density $\pi_t := \tilde\pi_t / r$ with respect to $\mu$, where $r(x) = 2^{-1}\exp(-|x|)$.

Theorem 8.7 in [13] shows that $\pi$ satisfies (F6). In particular, $\pi$ is $(\mathcal{Y}_t)$-adapted
for each $x$ and continuous in $(t,x)$, and hence $\mathcal{P}_Y \times \mathcal{X}$-measurable.
It thus satisfies (F3). Equations (8.123) and (8.124) in [13] enable the
explicit calculation of $A\pi_t$; straightforward calculations (involving integration
by parts) show that

{ATrt){x) = 7r / ^'yt{x,y)exp{B{x)-B{y)+Tt{x,y))n{yA){x)n{0,R){y)dy, 


2r 


where 


7t{x,y) := -6^(x)+ %) 


dTt dTt y 
dx dy R 

-b'{x) - b'{y) + + 2^-^ + 


, d'^Tt 1 


dx"^ 


R 


dxdy ' dy'^ 

(A proof of the existence and continuity of the derivatives of $\Gamma$ is contained
in the proof of Lemma 8.8 in [13].) In particular, $A\pi$ is $(\mathcal{Y}_t)$-adapted for each
$x$ and continuous in $(t,x)$, and hence $\mathcal{P}_Y \times \mathcal{X}$-measurable. The derivatives
of $\Gamma$ can be computed in closed form; for example


dx 


-h'(X^’^’^)diys- / -((h-h,)h' + bb' + b"/2)(X^’^’^)ds. 


The joint density $n(y, t)(x)\,n(0, R)(y)$ can be written in the $x$-marginal/$y$-conditional
form, $n(\alpha_t x, \sigma_t^2)(y)\,n(0, R + t)(x)$, where $\alpha_t := R/(R + t)$ and
$\sigma_t^2 := Rt/(R + t)$, and so






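$$\pi_t(x) = \frac{n(0, R+t)}{r}(x) \int \tilde{\mathbb{E}} \exp(B(x) - B(y) + \Gamma_t(x,y))\, n(\alpha_t x, \sigma_t^2)(y)\,dy, \qquad (48)$$

$$(A\pi_t)(x) = \frac{n(0, R+t)}{2r}(x) \int \tilde{\mathbb{E}}\, \gamma_t(x,y) \exp(B(x) - B(y) + \Gamma_t(x,y))\, n(\alpha_t x, \sigma_t^2)(y)\,dy. \qquad (49)$$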


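Since $|b|, |b'|, |h| \le C$, for any $k \in \mathbb{N}$ and any $x, y \in \mathbb{R}$,

$$\mathbb{E}\tilde{\mathbb{E}} \exp(k\Gamma_t(x,y)) \le \mathbb{E}\tilde{\mathbb{E}}\, \Xi_t^k(x,y) \exp\big((2k(k-1)C^2 + k(C^2 + C)/2)t\big)$$

$$= \exp\big((2k(k-1)C^2 + k(C^2 + C)/2)t\big), \qquad (50)$$

where $\Xi^k(x,y)$ is the exponential martingale

$$\Xi_t^k(x,y) := \exp\Big(k \int_0^t (h - \bar h_s)(\tilde{X}_s^{t,x,y})\,d\nu_s - \frac{k^2}{2} \int_0^t (h - \bar h_s)^2(\tilde{X}_s^{t,x,y})\,ds\Big).$$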
Furthermore, since $|B(y)| \le C|y|$,

$$\int \exp(-kB(y))\, n(\alpha_t x, \sigma_t^2)(y)\,dy \le 2 \int \cosh(kCy)\, n(\alpha_t x, \sigma_t^2)(y)\,dy$$

$$= 2\exp(k^2 C^2 \sigma_t^2/2) \cosh(kC\alpha_t x)$$

$$\le 2\cosh(kCx) \exp(k^2 C^2 t/2). \qquad (51)$$
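Two Gaussian identities carry the argument at this point: the rewriting of $n(y,t)(x)\,n(0,R)(y)$ in $x$-marginal/$y$-conditional form, and the closed form of $\int \cosh(ay)\,n(m,v)(y)\,dy = \exp(a^2 v/2)\cosh(am)$. The following sketch checks both numerically. It is an illustration only; the function names and the quadrature settings are assumptions, not part of the paper.

```python
import numpy as np

def gauss(mean, var, x):
    """Gaussian density with the given mean and variance, evaluated at x."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def factorisation_gap(R, t, x, y):
    """|n(y,t)(x) n(0,R)(y) - n(alpha x, sigma^2)(y) n(0,R+t)(x)| at one point."""
    alpha = R / (R + t)
    sigma2 = R * t / (R + t)
    joint = gauss(y, t, x) * gauss(0.0, R, y)
    factored = gauss(alpha * x, sigma2, y) * gauss(0.0, R + t, x)
    return abs(joint - factored)

def cosh_integral(a, mean, var, half_width=12.0, n=200001):
    """Trapezoidal approximation of the integral of cosh(a y) n(mean, var)(y)."""
    sd = np.sqrt(var)
    y = np.linspace(mean - half_width * sd, mean + half_width * sd, n)
    f = np.cosh(a * y) * gauss(mean, var, y)
    return float(np.sum((f[:-1] + f[1:]) * np.diff(y)) / 2.0)

def cosh_closed_form(a, mean, var):
    """exp(a^2 var / 2) cosh(a mean), the claimed value of the integral."""
    return float(np.exp(a * a * var / 2.0) * np.cosh(a * mean))
```

Because $\alpha_t \le 1$ and $\sigma_t^2 \le t$, the closed form is in turn dominated by $\exp(a^2 t/2)\cosh(ax)$, which is the final inequality used in the text.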

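Applying Jensen’s inequality to (48), we obtain

$$\pi_t^2(x) \le \frac{n(0, R+t)^2}{r^2}(x) \int \tilde{\mathbb{E}} \exp\big(2(B(x) - B(y) + \Gamma_t(x,y))\big)\, n(\alpha_t x, \sigma_t^2)(y)\,dy,$$

and so it follows from (50) and (51) with $k = 2$ that

$$\mathbb{E} E_\mu \pi_t^2 \le 4\exp((7C^2 + C)t) \int \cosh^2(2Cx)\, \frac{n(0, R+t)^2}{r}(x)\,dx$$

$$\le K_T < \infty \quad \text{for all } t \in [0,T] \text{ and any } T < \infty. \qquad (52)$$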

Together with the boundedness of $h$, this shows that (F5) is satisfied.
Applying Jensen’s inequality and the Cauchy-Schwarz inequality to (49), we
obtain

E{A7rt)‘^{x) < J 

X ^EEexp(4(5(a;) - B{y) + Tt{x, y)))n{atx, a^){y) dy. 

It follows from the bounds in (47), and standard properties of the Lebesgue
and Itô integrals that, for any $x, y \in \mathbb{R}$,

EE'yt{x,yY <K{1 + t®)(l + 2/^) for some K < oo. 

Following the same steps as were used in the proof of (52) (but using $k = 4$
in (50) and (51)), we now conclude that

$$\mathbb{E} E_\mu (A\pi_t)^2 \le K_T < \infty \quad \text{for all } t \in [0,T] \text{ and any } T < \infty. \qquad (53)$$

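A further application of Jensen’s inequality to (48) yields

$$\pi_t(x)^{-1} \le \frac{r}{n(0, R+t)}(x) \int \tilde{\mathbb{E}} \exp(-B(x) + B(y) - \Gamma_t(x,y))\, n(\alpha_t x, \sigma_t^2)(y)\,dy,$$

which, together with (49), shows that

$$|\pi_t(x)^{-1} A\pi_t(x)| \le \frac{1}{2} \int \tilde{\mathbb{E}}\, |\gamma_t(x,y)| \exp(-B(y) + \Gamma_t(x,y))\, n(\alpha_t x, \sigma_t^2)(y)\,dy$$

$$\times \int \tilde{\mathbb{E}} \exp(B(y) - \Gamma_t(x,y))\, n(\alpha_t x, \sigma_t^2)(y)\,dy,$$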

and the bound, 

E (^TT^^ATitY ^ exp(i^(l + t)) for some K < oo, (54) 

easily follows. The bound fl2^ . for any T < oo, follows from fl5^ and (l5T|) . 
and this establishes (F4). □ 

5.2 A Kalman-Bucy Filter 

This is another example in which the signal is a diffusion process. Here
$b(x) = Bx$, $a(x) = A$, $h(x) = Cx$ and $P_0 = N(m_0, R_0)$, where $B$ is an $m \times m$
matrix, $A$ is a positive semi-definite $m \times m$ matrix, $C$ is a $d \times m$ matrix,
and $R_0$ is a positive definite $m \times m$ matrix. The posterior distribution is
$\Pi_t = N(\bar X_t, R_t)$, where the mean vector, $\bar X_t$, and covariance matrix, $R_t$,
satisfy the Kalman-Bucy filtering equations [13]:

$$\bar X_t = m_0 + \int_0^t B \bar X_s\,ds + \int_0^t R_s C^*\,d\nu_s \qquad (55)$$

$$R_t = R_0 + \int_0^t (B R_s + R_s B^* + A - R_s C^* C R_s)\,ds. \qquad (56)$$
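As a rough numerical illustration of the mean and covariance equations above, the sketch below applies a plain Euler scheme to them. It is not taken from the paper: the function names are hypothetical, the innovation increment is assumed to be $d\nu_s = dY_s - C\bar X_s\,ds$, and no attempt is made to preserve positive definiteness of the covariance for large step sizes.

```python
import numpy as np

def riccati_rhs(R, B, A, C):
    """Right-hand side of the covariance equation: B R + R B* + A - R C* C R."""
    return B @ R + R @ B.T + A - R @ C.T @ C @ R

def kalman_bucy_euler(m0, R0, B, A, C, dY, dt):
    """Euler discretisation of the Kalman-Bucy mean and covariance equations.

    dY is an array of observation increments with shape (steps, d).
    Returns the arrays of means and covariances at times 0, dt, 2 dt, ...
    """
    x = np.array(m0, dtype=float)
    R = np.array(R0, dtype=float)
    means, covs = [x.copy()], [R.copy()]
    for dy in dY:
        innovation = dy - (C @ x) * dt          # assumed: d nu = dY - C x dt
        x = x + (B @ x) * dt + R @ C.T @ innovation
        R = R + riccati_rhs(R, B, A, C) * dt    # deterministic Riccati step
        means.append(x.copy())
        covs.append(R.copy())
    return np.array(means), np.array(covs)
```

In the scalar case with $B = b$, $A = a$ and $C = c$, the covariance should settle at the positive root of $2bR + a - c^2R^2 = 0$, namely $R_\infty = (b + \sqrt{b^2 + ac^2})/c^2$, which gives a simple sanity check on the scheme.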

It is well known that such Gaussian measures belong to finite-dimensional
exponential statistical manifolds. In order to apply the results of sections
3 and 4, we construct such a manifold as a $C^\infty$-embedded submanifold of
$M$, where $\mu$ has density $2^{-m}\exp(-\sum_i |x^i|)$ with respect to Lebesgue
measure. Let $S$ be the set of symmetric positive definite $m \times m$ real matrices,
and let $n := m(m+3)/2$. For any $y \in \mathbb{R}^n$, let $\alpha(y)$ be the $m$-vector comprising
the first $m$ elements of $y$, and let $\beta(y)$ be the symmetric $m \times m$ matrix
whose lower triangle contains the remaining $m(m+1)/2$ elements of $y$ in some fixed
arrangement. Then $G := (\alpha, \beta)^{-1}(\mathbb{R}^m \times S)$ is an open subset of $\mathbb{R}^n$, and the
map $(\alpha, \beta) : G \to \mathbb{R}^m \times S$ is a linear bijection. Let $e : X \times \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ be
defined by

$$e(x, y, z) := -\frac{1}{2} x^* \beta(y) x + \alpha(y)^* x + z \sum_{i=1}^m |x^i|;$$
then $\Lambda e(\cdot, y, z) \in e(M)$ for all $(y, z) \in G \times \mathbb{R}$. Let $\eta_i := \Lambda e(\cdot, e_i, 0)$, where
$(e_i, 1 \le i \le n)$ is the coordinate orthonormal basis in $\mathbb{R}^n$, let $\eta_{n+1} :=
\Lambda e(\cdot, 0, 1)$, and let $\tilde N := e^{-1} \circ \gamma(G \times \mathbb{R})$, where $\gamma(y, z) := y^i \eta_i + z\eta_{n+1}$. $\tilde N$
is an $(n+1)$-dimensional instance of the exponential manifold discussed in
section 2. The $n$-dimensional submanifold $N := e^{-1} \circ \gamma(G \times \{1\})$ comprises
all the non-singular Gaussian measures on $\mathbb{R}^m$.
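The coordinate vector $y \in \mathbb{R}^n$, with $n = m(m+3)/2$, packs an $m$-vector and the lower triangle of a symmetric $m \times m$ matrix. The sketch below shows one possible "fixed arrangement" (row-major lower triangle) and the resulting correspondence with a Gaussian mean and covariance, assuming the usual natural-parameter reading $\beta = \Sigma^{-1}$ and $\alpha = \Sigma^{-1} m$ of a density proportional to $\exp(-\tfrac12 x^*\beta x + \alpha^* x)$. The function names and the choice of arrangement are assumptions made for illustration.

```python
import numpy as np

def pack(alpha, beta):
    """Stack alpha and the lower triangle of beta into y of length m(m+3)/2."""
    m = alpha.shape[0]
    rows, cols = np.tril_indices(m)
    return np.concatenate([alpha, beta[rows, cols]])

def unpack(y, m):
    """Recover (alpha, beta) from y; beta is rebuilt as a symmetric matrix."""
    alpha = y[:m].copy()
    beta = np.zeros((m, m))
    rows, cols = np.tril_indices(m)
    beta[rows, cols] = y[m:]
    beta = beta + beta.T - np.diag(np.diag(beta))
    return alpha, beta

def gaussian_to_y(mean, cov):
    """Coordinates of N(mean, cov): beta = cov^{-1}, alpha = cov^{-1} mean."""
    beta = np.linalg.inv(cov)
    return pack(beta @ mean, beta)

def y_to_gaussian(y, m):
    """Inverse map; valid when beta(y) is positive definite, i.e. y lies in G."""
    alpha, beta = unpack(y, m)
    cov = np.linalg.inv(beta)
    return cov @ alpha, cov
```

The map is linear in $(\alpha, \beta)$ and nonlinear in (mean, covariance), which is why the Kalman-Bucy equations look simpler in moment form than in these coordinates.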

Proposition 5.2. The diffusion process $(X,Y)$ defined above satisfies (F1-F6),
and (F7-F10) with respect to the $n$-dimensional submanifold $N$.

Proof. (F7) (and hence (F1)) follows from the fact that $R_t \in S$ for all $t$;
(F2) and (F9) are obvious. Straightforward calculations show that, for any
$P \in N$,

$$\frac{Ap}{p}(x) = \frac{1}{2} x^* \beta(y)(A\beta(y) + 2B)x - \alpha(y)^*(A\beta(y) + B)x + \frac{1}{2}\big(\alpha(y)^* A \alpha(y) - \mathrm{tr}(A\beta(y) + 2B)\big),$$

where $y = \theta(P)$, and (F8) and (F10) readily follow. (F3-F6) are easily
verified from (55), (56). □

5.3 Wonham’s Filter 

In this example, $X$ and $A$ are as defined in Example 3.1 of section 3. $X$ is a Markov
jump process for which $\mathbb{P}(X_0 = x) > 0$ for all $x$, and $\mu$ is the uniform
probability measure. $M$ is itself an $n$ ($= m - 1$)-dimensional exponential
statistical manifold. In the set-up of section 4, appropriate choices are $G =
\mathbb{R}^n$ and := l{i} — n~^ l{i}- (F1-F10) are easily verified.

6 Concluding Remarks 

This paper developed information geometric representations for nonlinear
filters in continuous time, and studied their properties. Information manifolds
are natural state spaces for the posterior distributions of Bayesian
estimation problems where many statistics are required. They clarify
information-theoretic properties of estimators, and their metrics are appropriate
“multi-objective” measures of approximation error. The results also have bearing
on the theory of non-equilibrium statistical mechanics, in which rates of
entropy production can be associated with rates of information supply [IH], and
hence with the quadratic variation of a process of “mesoscopic states” in a
particular pseudo-Riemannian metric.

The development of approximations is beyond the scope of this paper.
However, we conclude with a few remarks on this issue. One approach is to
first define an appropriate differential equation (evolution equation) to which
numerical methods might be applied. Equations fl^ and (45) are expressed
in terms of the innovations process $\nu$ in order to emphasise their
information-theoretic properties. Substituting for $\nu$ in fin|) , we obtain an $H$-valued Itô
equation for the nonlinear filter in terms of the observation process $Y$:


0(n,) = 0(Po) + 



Zs) ds + 



(57) 


where Zt := If h is bounded, then 2 ; and vck can be expressed in terms 

of the time-dependent, locally Lipschitz vector helds z, : [0, 00 ) x H ^ H, 
where 

Zi(a) = A ^\ht-Ephtl^ + {p + l){ht-Epht)*Ephi^ (58) 

Vfc,i(a) = A{p+l){ht-Epht), (59) 

and $P = \phi^{-1}(a)$. However, except in special cases such as the exponential
filters of section 4, $u$ is more problematic since the infinitesimal
characterisation of Pt in fl^ is dependent on the topology of the signal space $X$.
The topology of $M$ arises from purely measure-theoretic constructs, and is
not dependent on the existence of a topology on $X$. ([20] assumes only that
$(X, \mathcal{X}, \mu)$ is a probability space.) This is quite natural in the context of
Bayesian estimation and, in particular, nonlinear filtering: Bayes’ formula
and Shannon’s information quantities are measure-theoretic in nature, as is
the Markov property in its most general form (a property of conditional
independence). It may be possible to overcome this problem by strengthening
the topology of $M$ in some way (for example, by the use of Sobolev space
techniques in the case of filters for diffusion signals). However, this is not
necessarily the best approach; for the purposes of approximation, it suffices
to solve a simpler evolution equation for an approximate filter. This idea is
developed in [2], where approximations to $\Pi$ are constrained to remain on
finite-dimensional exponential statistical manifolds, on which projections of
the processes $u$, $z$ and $v_k$ can be represented in terms of locally Lipschitz
vector fields. The manifold $M$ contains a rich variety of smoothly embedded
submanifolds, to which this method could be generalised [20]. The selection
of a good submanifold for a particular problem, together with a suitable
coordinate system, would be critical to this approach.

Hilbert-space-valued filtering equations based on the Zakai equation may
be more suitable for these purposes. Manifolds of finite (un-normalised)
measures, to which the Zakai equation could be lifted, are developed in [21].
These avoid the normalisation constant $Z(a)$ in the computation of the
density.

Another approach to the problem of approximation would be to switch
to a discrete-time model “up front”, replacing (1) by a time-sampled
version. This would replace the Itô equation (57) by a difference equation, on
which approximations could be based. This would avoid the Kolmogorov
forward equation, replacing it by an integral equation (the
Chapman-Kolmogorov equation) over each time step, and thereby eliminating problems
concerning the topology of $X$. (In the case of diffusion signal processes, for
example, the transition measure over a short time step could be approximated
by an appropriate Gaussian.)
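To make the discrete-time idea concrete, the following sketch performs one step of a time-sampled filter for a finite-state Markov signal: a Chapman-Kolmogorov prediction followed by a Bayes update. It is an illustration under assumptions that are not in the paper: a scalar observation, a rate matrix `Q` with rows summing to zero, the approximation of $h(X_s)$ by a constant over each step, and a truncated Taylor series for the transition matrix. All names are hypothetical.

```python
import numpy as np

def transition_matrix(Q, dt, terms=30):
    """exp(Q dt) by truncated Taylor series; adequate when Q dt is small."""
    step = np.asarray(Q, dtype=float) * dt
    T = np.eye(step.shape[0])
    term = np.eye(step.shape[0])
    for n in range(1, terms + 1):
        term = (term @ step) / n
        T = T + term
    return T

def filter_step(prior, T, h, dy, dt):
    """One predict/update step for the observation increment dy over time dt.

    prior : posterior probabilities at the start of the step
    T     : transition matrix over the step (rows sum to one)
    h     : observation function evaluated at each state
    """
    predicted = prior @ T                     # Chapman-Kolmogorov prediction
    # Log-likelihood of dy ~ N(h dt, dt), up to a state-independent constant.
    log_lik = h * dy - 0.5 * h ** 2 * dt
    weights = predicted * np.exp(log_lik - log_lik.max())
    return weights / weights.sum()            # Bayes update and normalisation
```

Each step keeps the posterior on the probability simplex by construction, so no topology on the signal space is needed beyond the ability to evaluate the transition kernel.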

Time reversal is used in imiiHiiini to construct dual filtering problems,
in which the primal signal and filter processes exchange roles. In the notation
of this article, the dual filter computes the process of posterior distributions
for the primal filter $\Pi$ (regarded as a dual signal) in reverse time, based on a
dual observation process. Such posterior distributions take values in the set
of probability measures on $M$. Since $M$ is itself a complete, separable metric
space, one can easily use the construction of section 2.1 to define a (dual)
Hilbert manifold of such probability measures. However, a striking feature
of the dual filter is that it is parametrised by the primal signal process $X$,
reversed in time. In this way, the topology of the primal signal space $X$ is
connected with the information topology of the dual problem. It may be
possible to exploit this fact in filter approximations.

Notions of information supply and dissipation for nonlinear filters are
defined in [TSl[Tn]. The supply at time $t$ is the mutual information $I(X; Y_0^t)$,
and the dissipation is the $X_t$-conditional variant, $\mathbb{E}\,I(X; Y_0^t \mid X_t)$. Modulo
initial conditions, the supply of the primal filter is the dissipation of its dual,
and vice-versa [19]. The quantity $\mathbb{E}\,I(X; Y_0^t \mid X_t)$ was studied in [11] in the
context of filters for diffusion processes, and shown to be connected with the
Fisher metric in the sense that

E/(X; \X,) = ^jy (v log a (v log (X,) ds, 

where $p$ and $\pi$ are the prior and posterior densities, and $a$ is the diffusion
matrix for the signal. The integral here is the average quadratic variation
of the dual filter in the Fisher metric, and the integrand is the mean-square
error for the dual observation function $\tilde h(X_s, p) := (\sigma^* \nabla \log p)(X_s)$, where
$\sigma$ is a matrix square-root of $a$ [19].